Load Required Packages

First, we load all packages required for this project at once. This lets the project partners load and work with the same versions of the packages.
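A minimal sketch of loading a package list in one place, under stated assumptions: the report does not list its packages, so `required_packages` is a hypothetical list containing only `knitr` (implied by the later `kable()` calls), and `load_packages()` is an illustrative helper, not the project's actual loader. Printing `sessionInfo()` records the versions in use so partners can compare them.

```r
# Hypothetical package list; only knitr (for kable) is evident from this report.
required_packages <- c("knitr")

# Illustrative helper: attach every installed package from the list and
# warn about the ones that are not installed instead of stopping.
load_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  not_installed <- pkgs[!installed]
  if (length(not_installed) > 0) {
    warning("Packages not installed: ", paste(not_installed, collapse = ", "))
  }
  available <- pkgs[installed]
  invisible(lapply(available, library, character.only = TRUE))
  available
}

loaded <- load_packages(required_packages)

# Record R and package versions so partners can compare their setups.
print(sessionInfo())
```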

1. Dataset and Introduction

We chose the ECC dataset because it concerns healthcare, the field that interests us most professionally.

Objective: Early childhood caries (ECC) is a potentially severe disease affecting children all over the world [1]. The available findings are mostly based on a logistic regression model, but data mining could be used to extract more information from the same dataset. In the paper, the authors implement association rule mining for interpretability. While model interpretability is important, we seek other classification and clustering methods with better performance.

Secondly, we import the training, validation and test splits of the ECC dataset.

2. Descriptive Statistics

#READ DATA
TRAIN = read.csv("./ECC_train.csv")
VALIDATION = read.csv("./ECC_validation.csv")
TEST = read.csv("./ECC_test.csv")
## 3. Classification Methods
options(knitr.kable.NA = '')
# summary() gives a brief overview of every column in the dataset.
kable(summary(TRAIN)) 
CITY CHILD_ETHNICITY CHILD_AGE CHILD_GENDER CHILD_SERBIAN_LANGUAGE MOTHER_AGE MARITAL_STATUS MOTHER_ETHNICITY MOTHER_SERBIAN_LANGUAGE NUMBER_OF_CHILDREN BIRTH_ORDER MOTHER_EDUCATION_LEVEL MOTHER_EMPLOYMENT_STATUS QUALITY_OF_HOUSING HOUSING_CONDITIONS HOUSEHOLD_MONTHLY_INCOME BIRTH_WEIGHT BREASTFEEDING BREASTFEEDING_FREQUENCY BREASTFEEDING_DURING_NIGHT BOTTLE_FEEDING INFANT_FORMULAS ADDITIONAL_FOOD_SWEETENING CHILD_FLUORIDE_SUPPLEMENTS CHILD_FLUORIDE_TOOTHPASTE CHILD_ORAL_HYGIENE CHILD_TOOTH_BRUSHING DIARRHEA_DURING_INFANCY MEDICAL_SYRUPS CHILD_FIRST_DENTIST_VISIT SWEETS_DURING_PREGNANCY FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY ORAL_HEALTH_DURING_PREGNANCY MOTHER_HEALTH_AWARENESS FATHER_HEALTH_AWARENESS ECC
NOVI_SAD :79 Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. : 1.00 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000 Min. :1 Min. : 1 Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
BACKA_PALANKA:42 1st Qu.:1.000 1st Qu.:3.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.: 1.00 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:1.000 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1 1st Qu.: 2 1st Qu.: 1.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.0 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
KISAC :29 Median :1.000 Median :3.00 Median :1.000 Median :1.000 Median :3.000 Median :1.000 Median : 1.00 Median :2.000 Median :2.0 Median :1.000 Median :3.000 Median :3.000 Median :2.000 Median :1.0 Median :4.000 Median :2.000 Median :2 Median : 2 Median : 1.00 Median :2.000 Median :2.000 Median :2.000 Median :3.000 Median :1.000 Median :2.000 Median :2.000 Median :2.0 Median :2.000 Median :3.000 Median :2.000 Median :3.000 Median :2.000 Median :2.000 Median :2.000 Median :2.000
RUSKI_KRSTUR :23 Mean :2.167 Mean :3.13 Mean :1.473 Mean :1.134 Mean :2.427 Mean :1.238 Mean : 22.91 Mean :1.732 Mean :1.9 Mean :1.678 Mean :3.008 Mean :2.427 Mean :1.849 Mean :1.1 Mean :3.347 Mean :1.908 Mean :2 Mean :119 Mean : 93.01 Mean :2.427 Mean :1.565 Mean :2.297 Mean :2.707 Mean :1.397 Mean :1.879 Mean :2.146 Mean :1.9 Mean :2.423 Mean :3.159 Mean :1.854 Mean :2.431 Mean :1.799 Mean :2.059 Mean :1.874 Mean :1.703
TITEL :22 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.: 3.00 3rd Qu.:2.000 3rd Qu.:2.0 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:1.0 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:3 3rd Qu.: 3 3rd Qu.: 1.00 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
TEMERIN :17 Max. :7.000 Max. :5.00 Max. :2.000 Max. :2.000 Max. :3.000 Max. :3.000 Max. :999.00 Max. :2.000 Max. :3.0 Max. :3.000 Max. :4.000 Max. :4.000 Max. :3.000 Max. :2.0 Max. :5.000 Max. :2.000 Max. :4 Max. :999 Max. :999.00 Max. :4.000 Max. :2.000 Max. :3.000 Max. :3.000 Max. :3.000 Max. :3.000 Max. :4.000 Max. :2.0 Max. :3.000 Max. :4.000 Max. :3.000 Max. :3.000 Max. :3.000 Max. :3.000 Max. :3.000 Max. :2.000
(Other) :27
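In the summary above, MOTHER_ETHNICITY, BREASTFEEDING_FREQUENCY and BREASTFEEDING_DURING_NIGHT have a maximum of 999 while their quartiles stay in the single digits. One plausible reading, which is an assumption and not something the dataset description here confirms, is that 999 is a missing-value code. The sketch below shows how such a code could be turned into `NA` on a made-up toy data frame; `recode_sentinel()` is a hypothetical helper.

```r
# Toy data standing in for TRAIN; the values are made up for illustration.
toy <- data.frame(
  MOTHER_ETHNICITY        = c(1, 3, 999, 2),
  BREASTFEEDING_FREQUENCY = c(2, 999, 3, 1),
  CHILD_AGE               = c(3, 4, 2, 5)
)

# Assumption: 999 is a missing-value code, not a real category.
recode_sentinel <- function(df, sentinel = 999) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) {
    x[!is.na(x) & x == sentinel] <- NA
    x
  })
  df
}

toy_clean <- recode_sentinel(toy)

# Number of recoded values per column and the largest remaining code.
colSums(is.na(toy_clean))
max(toy_clean$MOTHER_ETHNICITY, na.rm = TRUE)
```

If the assumption holds, the mean, variance and range of these three columns would need `na.rm = TRUE` after recoding, since they are otherwise dominated by the 999 entries.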
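# Histogram of every column except CITY (column 1, categorical)
for (col in 2:ncol(TRAIN)) {
  hist(TRAIN[,col], main = paste("Histogram of", colnames(TRAIN)[col]))
}

# Normal QQ plot with reference line for the same columns
for (col in 2:ncol(TRAIN)) {
  qqnorm(TRAIN[,col], main = paste("Normal QQ Plot of", colnames(TRAIN)[col]))
  qqline(TRAIN[,col])
}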
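The two loops above produce one plot per attribute and per plot type. As a hedged alternative, the sketch below places the histogram and the QQ plot of each attribute side by side and writes them to a single PDF. The toy data frame and the temporary output path `plot_file` are assumptions made to keep the example self-contained.

```r
# Toy numeric data standing in for the numeric columns of TRAIN; values are made up.
toy <- data.frame(
  CHILD_AGE = c(1, 2, 3, 3, 4, 5, 3, 2),
  ECC       = c(1, 2, 2, 1, 2, 2, 1, 2)
)

# Hypothetical output path; a temporary file keeps the sketch self-contained.
plot_file <- tempfile(fileext = ".pdf")

pdf(plot_file, width = 8, height = 4)
old_par <- par(mfrow = c(1, 2))  # two panels per page
for (name in colnames(toy)) {
  hist(toy[[name]], main = paste("Histogram of", name), xlab = name)
  qqnorm(toy[[name]], main = paste("Normal QQ Plot of", name))
  qqline(toy[[name]])
}
par(old_par)
invisible(dev.off())
```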

A-3. Descriptive Location Measures for Each of the Numerical Attributes

Central Tendency

  • We already obtained the mean and median of each attribute with the summary() command.
  • Besides these, the geometric mean is another important measure of central tendency.
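# Geometric mean of each numeric column; CITY (column 1) is categorical and stays 0.
geomean = matrix(0, ncol(TRAIN), 1)
for (col in 2:ncol(TRAIN)) {
  # exp(mean(log(x))) is the geometric mean; all values must be positive
  geomean[col] = exp(mean(log(TRAIN[,col])))
}
geomean_vector <- data.frame(geomean)
row.names(geomean_vector) <- colnames(TRAIN)
kable(geomean_vector,row.names = TRUE)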
geomean
CITY 0.000000
CHILD_ETHNICITY 1.758148
CHILD_AGE 3.018245
CHILD_GENDER 1.387803
CHILD_SERBIAN_LANGUAGE 1.097249
MOTHER_AGE 2.285236
MARITAL_STATUS 1.158650
MOTHER_ETHNICITY 1.887356
MOTHER_SERBIAN_LANGUAGE 1.661191
NUMBER_OF_CHILDREN 1.741823
BIRTH_ORDER 1.513556
MOTHER_EDUCATION_LEVEL 2.908949
MOTHER_EMPLOYMENT_STATUS 2.217976
QUALITY_OF_HOUSING 1.632377
HOUSING_CONDITIONS 1.072084
HOUSEHOLD_MONTHLY_INCOME 3.108337
BIRTH_WEIGHT 1.876377
BREASTFEEDING 1.754348
BREASTFEEDING_FREQUENCY 4.413374
BREASTFEEDING_DURING_NIGHT 2.084179
BOTTLE_FEEDING 2.240267
INFANT_FORMULAS 1.479237
ADDITIONAL_FOOD_SWEETENING 2.165548
CHILD_FLUORIDE_SUPPLEMENTS 2.638544
CHILD_FLUORIDE_TOOTHPASTE 1.272027
CHILD_ORAL_HYGIENE 1.799259
CHILD_TOOTH_BRUSHING 1.968312
DIARRHEA_DURING_INFANCY 1.865525
MEDICAL_SYRUPS 2.368098
CHILD_FIRST_DENTIST_VISIT 2.985174
SWEETS_DURING_PREGNANCY 1.783182
FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY 2.259006
ORAL_HEALTH_DURING_PREGNANCY 1.650330
MOTHER_HEALTH_AWARENESS 1.992148
FATHER_HEALTH_AWARENESS 1.783284
ECC 1.627806
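The geometric mean in the table above is computed as the exponential of the mean of the logarithms. A minimal vectorized sketch of the same calculation follows, assuming a hypothetical helper `geometric_mean()` and a made-up toy data frame. Unlike the loop, it selects numeric columns automatically and returns `NA` when a column contains non-positive values, for which the logarithm is undefined.

```r
# Hypothetical helper: geometric mean of a numeric vector.
geometric_mean <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0 || any(x <= 0)) {
    return(NA_real_)  # log() is undefined for zero or negative values
  }
  exp(mean(log(x)))
}

# Toy data standing in for TRAIN; the values are made up for illustration.
toy <- data.frame(
  CITY      = c("A", "B", "A", "C"),
  CHILD_AGE = c(1, 2, 4, 8),
  ECC       = c(1, 2, 2, 1)
)

# Apply the helper to the numeric columns only (CITY is skipped).
numeric_cols <- vapply(toy, is.numeric, logical(1))
geo <- vapply(toy[numeric_cols], geometric_mean, numeric(1))
geo
```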

Dispersion

Besides central tendency, we also need to know how closely the data fall around the center, that is, the spread of each attribute. The measures computed here are the range, the interquartile range, the variance and the coefficient of variation.

# Range (max - min) of each numeric column; CITY (column 1) stays 0.
rangeVector = matrix(0, ncol(TRAIN), 1)
for (col in 2:ncol(TRAIN)) {
  rangeVector[col] = max(TRAIN[,col], na.rm = TRUE)-min(TRAIN[,col], na.rm = TRUE)
}

range_Vector <- data.frame(rangeVector)
row.names(range_Vector) <- colnames(TRAIN)
kable(range_Vector,row.names = TRUE)
rangeVector
CITY 0
CHILD_ETHNICITY 6
CHILD_AGE 4
CHILD_GENDER 1
CHILD_SERBIAN_LANGUAGE 1
MOTHER_AGE 2
MARITAL_STATUS 2
MOTHER_ETHNICITY 998
MOTHER_SERBIAN_LANGUAGE 1
NUMBER_OF_CHILDREN 2
BIRTH_ORDER 2
MOTHER_EDUCATION_LEVEL 3
MOTHER_EMPLOYMENT_STATUS 3
QUALITY_OF_HOUSING 2
HOUSING_CONDITIONS 1
HOUSEHOLD_MONTHLY_INCOME 4
BIRTH_WEIGHT 1
BREASTFEEDING 3
BREASTFEEDING_FREQUENCY 998
BREASTFEEDING_DURING_NIGHT 998
BOTTLE_FEEDING 3
INFANT_FORMULAS 1
ADDITIONAL_FOOD_SWEETENING 2
CHILD_FLUORIDE_SUPPLEMENTS 2
CHILD_FLUORIDE_TOOTHPASTE 2
CHILD_ORAL_HYGIENE 2
CHILD_TOOTH_BRUSHING 3
DIARRHEA_DURING_INFANCY 1
MEDICAL_SYRUPS 2
CHILD_FIRST_DENTIST_VISIT 3
SWEETS_DURING_PREGNANCY 2
FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY 2
ORAL_HEALTH_DURING_PREGNANCY 2
MOTHER_HEALTH_AWARENESS 2
FATHER_HEALTH_AWARENESS 2
ECC 1
# Interquartile range (IQR) of each numeric column; CITY (column 1) stays 0.
iqc = matrix(0, ncol(TRAIN), 1)
for (col in 2:ncol(TRAIN)) {
  iqc[col] = IQR(TRAIN[,col])
}

iqr_vector <- data.frame(iqc)
row.names(iqr_vector) <- colnames(TRAIN)
kable(iqr_vector, row.names = TRUE)
iqc
CITY 0
CHILD_ETHNICITY 2
CHILD_AGE 1
CHILD_GENDER 1
CHILD_SERBIAN_LANGUAGE 0
MOTHER_AGE 1
MARITAL_STATUS 0
MOTHER_ETHNICITY 2
MOTHER_SERBIAN_LANGUAGE 1
NUMBER_OF_CHILDREN 1
BIRTH_ORDER 1
MOTHER_EDUCATION_LEVEL 0
MOTHER_EMPLOYMENT_STATUS 1
QUALITY_OF_HOUSING 2
HOUSING_CONDITIONS 0
HOUSEHOLD_MONTHLY_INCOME 1
BIRTH_WEIGHT 0
BREASTFEEDING 2
BREASTFEEDING_FREQUENCY 1
BREASTFEEDING_DURING_NIGHT 0
BOTTLE_FEEDING 1
INFANT_FORMULAS 1
ADDITIONAL_FOOD_SWEETENING 1
CHILD_FLUORIDE_SUPPLEMENTS 1
CHILD_FLUORIDE_TOOTHPASTE 1
CHILD_ORAL_HYGIENE 0
CHILD_TOOTH_BRUSHING 1
DIARRHEA_DURING_INFANCY 0
MEDICAL_SYRUPS 1
CHILD_FIRST_DENTIST_VISIT 2
SWEETS_DURING_PREGNANCY 0
FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY 1
ORAL_HEALTH_DURING_PREGNANCY 1
MOTHER_HEALTH_AWARENESS 0
FATHER_HEALTH_AWARENESS 0
ECC 1
# Sample variance of each numeric column; CITY (column 1) stays 0.
variance = matrix(0, ncol(TRAIN), 1)
for (col in 2:ncol(TRAIN)) {
  variance[col] = var(TRAIN[,col])
}
var_vector <- data.frame(variance)
row.names(var_vector) <- colnames(TRAIN)
kable(var_vector, row.names = TRUE)
variance
CITY 0.000000e+00
CHILD_ETHNICITY 2.198762e+00
CHILD_AGE 6.091558e-01
CHILD_GENDER 2.503077e-01
CHILD_SERBIAN_LANGUAGE 1.164516e-01
MOTHER_AGE 5.229774e-01
MARITAL_STATUS 3.084280e-01
MOTHER_ETHNICITY 2.044553e+04
MOTHER_SERBIAN_LANGUAGE 1.968988e-01
NUMBER_OF_CHILDREN 5.697057e-01
BIRTH_ORDER 6.058507e-01
MOTHER_EDUCATION_LEVEL 4.537112e-01
MOTHER_EMPLOYMENT_STATUS 7.414648e-01
QUALITY_OF_HOUSING 8.175521e-01
HOUSING_CONDITIONS 9.071410e-02
HOUSEHOLD_MONTHLY_INCOME 1.194016e+00
BIRTH_WEIGHT 8.392810e-02
BREASTFEEDING 1.058823e+00
BREASTFEEDING_FREQUENCY 1.032018e+05
BREASTFEEDING_DURING_NIGHT 8.356663e+04
BOTTLE_FEEDING 8.254984e-01
INFANT_FORMULAS 2.468268e-01
ADDITIONAL_FOOD_SWEETENING 4.954116e-01
CHILD_FLUORIDE_SUPPLEMENTS 2.752013e-01
CHILD_FLUORIDE_TOOTHPASTE 4.841954e-01
CHILD_ORAL_HYGIENE 2.583243e-01
CHILD_TOOTH_BRUSHING 7.557751e-01
DIARRHEA_DURING_INFANCY 9.071410e-02
MEDICAL_SYRUPS 2.618403e-01
CHILD_FIRST_DENTIST_VISIT 8.653704e-01
SWEETS_DURING_PREGNANCY 2.179600e-01
FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY 6.160121e-01
ORAL_HEALTH_DURING_PREGNANCY 5.309237e-01
MOTHER_HEALTH_AWARENESS 2.486551e-01
FATHER_HEALTH_AWARENESS 3.035055e-01
ECC 2.096973e-01
# Coefficient of variation in percent (sd / mean * 100); CITY (column 1) stays 0.
CV = matrix(0, ncol(TRAIN), 1)
for (col in 2:ncol(TRAIN)) {
  CV[col] = sd(TRAIN[,col], na.rm=TRUE)/mean(TRAIN[,col], na.rm=TRUE)*100
}
CV_vector <- data.frame(CV)
row.names(CV_vector) <- colnames(TRAIN)
kable(CV_vector, row.names = TRUE)
CV
CITY 0.00000
CHILD_ETHNICITY 68.41594
CHILD_AGE 24.93794
CHILD_GENDER 33.96975
CHILD_SERBIAN_LANGUAGE 30.09548
MOTHER_AGE 29.79966
MARITAL_STATUS 44.84180
MOTHER_ETHNICITY 624.07043
MOTHER_SERBIAN_LANGUAGE 25.61646
NUMBER_OF_CHILDREN 39.73446
BIRTH_ORDER 46.39128
MOTHER_EDUCATION_LEVEL 22.39024
MOTHER_EMPLOYMENT_STATUS 35.48258
QUALITY_OF_HOUSING 48.89150
HOUSING_CONDITIONS 27.37030
HOUSEHOLD_MONTHLY_INCOME 32.64472
BIRTH_WEIGHT 15.18402
BREASTFEEDING 51.44958
BREASTFEEDING_FREQUENCY 270.01530
BREASTFEEDING_DURING_NIGHT 310.80960
BOTTLE_FEEDING 37.43933
INFANT_FORMULAS 31.74844
ADDITIONAL_FOOD_SWEETENING 30.64140
CHILD_FLUORIDE_SUPPLEMENTS 19.37844
CHILD_FLUORIDE_TOOTHPASTE 49.79225
CHILD_ORAL_HYGIENE 27.05417
CHILD_TOOTH_BRUSHING 40.50203
DIARRHEA_DURING_INFANCY 15.85548
MEDICAL_SYRUPS 21.12212
CHILD_FIRST_DENTIST_VISIT 29.44774
SWEETS_DURING_PREGNANCY 25.18735
FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY 32.28616
ORAL_HEALTH_DURING_PREGNANCY 40.49911
MOTHER_HEALTH_AWARENESS 24.22320
FATHER_HEALTH_AWARENESS 29.39024
ECC 26.89056
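The range, IQR, variance and coefficient of variation above are computed in four separate loops. As a sketch only, they could also be gathered into one table with a single pass over the numeric columns. `dispersion_summary()` is a hypothetical helper and the toy data frame is made up.

```r
# Hypothetical helper: four dispersion measures for one numeric vector.
dispersion_summary <- function(x) {
  c(
    range      = max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
    iqr        = IQR(x, na.rm = TRUE),
    variance   = var(x, na.rm = TRUE),
    cv_percent = sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100
  )
}

# Toy data standing in for TRAIN; the values are made up for illustration.
toy <- data.frame(
  CITY      = c("A", "B", "A", "C", "B"),
  CHILD_AGE = c(1, 2, 3, 4, 5),
  ECC       = c(1, 1, 2, 2, 2)
)

# One row per numeric attribute, one column per measure.
numeric_cols <- vapply(toy, is.numeric, logical(1))
dispersion <- t(sapply(toy[numeric_cols], dispersion_summary))
dispersion
```

The resulting matrix can be passed to `kable()` in the same way as the individual vectors.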
options(knitr.kable.NA = '')
# Numeric attributes only (drop CITY, column 1)
NUM=data.frame(TRAIN[2:36])

# Covariance matrix of the numeric attributes
kable(cov(NUM))
CHILD_ETHNICITY CHILD_AGE CHILD_GENDER CHILD_SERBIAN_LANGUAGE MOTHER_AGE MARITAL_STATUS MOTHER_ETHNICITY MOTHER_SERBIAN_LANGUAGE NUMBER_OF_CHILDREN BIRTH_ORDER MOTHER_EDUCATION_LEVEL MOTHER_EMPLOYMENT_STATUS QUALITY_OF_HOUSING HOUSING_CONDITIONS HOUSEHOLD_MONTHLY_INCOME BIRTH_WEIGHT BREASTFEEDING BREASTFEEDING_FREQUENCY BREASTFEEDING_DURING_NIGHT BOTTLE_FEEDING INFANT_FORMULAS ADDITIONAL_FOOD_SWEETENING CHILD_FLUORIDE_SUPPLEMENTS CHILD_FLUORIDE_TOOTHPASTE CHILD_ORAL_HYGIENE CHILD_TOOTH_BRUSHING DIARRHEA_DURING_INFANCY MEDICAL_SYRUPS CHILD_FIRST_DENTIST_VISIT SWEETS_DURING_PREGNANCY FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY ORAL_HEALTH_DURING_PREGNANCY MOTHER_HEALTH_AWARENESS FATHER_HEALTH_AWARENESS ECC
CHILD_ETHNICITY 2.1987624 -0.2024718 0.0339826 0.1581695 -0.2397947 0.0481523 52.6828346 -0.1776836 0.2899863 0.3062480 -0.5056081 -0.4834921 0.1135509 0.1806019 -0.7306354 -0.0727647 0.2857143 4.749161e+01 1.3431314 0.0333146 -0.0403115 0.0467107 0.0702331 0.2147076 0.1758553 0.2232868 -0.1511902 0.0760346 0.2295805 -0.0888330 0.0830315 0.3362751 -0.2241307 -0.3318449 -0.1265427
CHILD_AGE -0.2024718 0.6091558 0.0266517 -0.0300447 0.1040751 0.0109525 9.6080834 0.0138708 -0.0079287 -0.0336662 0.0703386 0.0788650 -0.0434056 -0.0256848 0.1480433 0.0119897 -0.0672269 -1.105976e+01 -7.7784009 -0.0303787 0.0440737 -0.0134841 -0.0416828 -0.0391688 -0.0178088 -0.1661334 0.0130797 -0.0340354 -0.1005415 0.0190746 -0.0309237 -0.0452692 0.0427903 0.0415597 0.0344925
CHILD_GENDER 0.0339826 0.0266517 0.2503077 -0.0131500 -0.0639745 0.0086143 -5.6851728 -0.0325235 0.0056608 -0.0151014 -0.0207799 -0.0051510 0.0421047 0.0111459 0.0073837 -0.0025140 -0.0336134 -1.356791e+01 -14.2560740 0.0368658 0.0007208 0.0060124 -0.0164024 0.0087550 -0.0012130 -0.0485215 0.0056608 -0.0199712 -0.0208678 0.0233114 0.0558876 -0.0096867 -0.0110052 0.0091769 0.0444077
CHILD_SERBIAN_LANGUAGE 0.1581695 -0.0300447 -0.0131500 0.1164516 -0.0447769 0.0057487 -2.6184382 -0.0228192 0.0555184 0.0559228 -0.0725537 -0.0615836 0.0160508 0.0159101 -0.0887100 -0.0128336 0.0000000 -3.126877e+00 -3.9717134 0.0014416 -0.0087198 0.0188812 0.0099680 0.0137829 0.0247178 0.0223269 -0.0243135 -0.0148026 0.0626560 0.0154882 0.0008790 0.0354066 -0.0246827 -0.0167364 -0.0272846
MOTHER_AGE -0.2397947 0.1040751 -0.0639745 -0.0447769 0.5229774 -0.0517914 3.3317746 0.0601420 0.0304314 0.0708484 0.1728842 0.1742379 -0.1245209 -0.0724482 0.2713336 0.0562568 -0.0378151 1.662213e-01 6.7569178 -0.0064344 -0.0067860 0.0533561 -0.0467459 -0.0905207 -0.0782497 -0.0585598 0.0514398 0.0121655 -0.0723427 0.0333497 -0.0880595 -0.1114061 0.0967441 0.0790057 0.0096691
MARITAL_STATUS 0.0481523 0.0109525 0.0086143 0.0057487 -0.0517914 0.3084280 3.4160015 -0.0535143 -0.0305721 -0.0194789 -0.0482226 -0.0349847 0.0276713 0.0347737 -0.0789705 0.0052389 0.0294118 1.392198e+01 7.3551387 -0.0223797 -0.0050280 -0.0081221 0.0113217 0.0434584 0.0710770 0.0951795 -0.0221687 -0.0675961 0.0291481 -0.0111459 0.0564502 0.0018811 -0.0644492 -0.0329630 -0.0212897
MOTHER_ETHNICITY 52.6828346 9.6080834 -5.6851728 -2.6184382 3.3317746 3.4160015 20445.5258606 1.2410780 2.3230723 2.7782954 -0.6589255 -1.0085616 -0.9502655 -1.9113076 4.5684575 1.8406174 4.4117647 1.753670e+03 -1927.9782532 -0.5337717 -3.4669667 2.2320945 1.9909637 0.2367533 -1.4686896 -2.8484231 -2.2147428 -0.3912837 1.0518442 -5.4078795 -4.8317218 0.2974052 2.7656728 -10.2085546 -6.4421785
MOTHER_SERBIAN_LANGUAGE -0.1776836 0.0138708 -0.0325235 -0.0228192 0.0601420 -0.0535143 1.2410780 0.1968988 -0.0732218 -0.0446187 0.0694772 0.0937555 -0.0363032 -0.0402236 0.1101930 0.0172638 -0.0084034 6.266358e+00 3.7417461 -0.0196899 0.0216413 -0.0251573 -0.0241377 -0.0611793 -0.0536374 -0.0866707 0.0234169 -0.0040083 -0.0412784 0.0068387 -0.0311698 -0.0414015 0.0409620 0.0166661 -0.0000527
NUMBER_OF_CHILDREN 0.2899863 -0.0079287 0.0056608 0.0555184 0.0304314 -0.0305721 2.3230723 -0.0732218 0.5697057 0.4759151 -0.1504167 -0.2174677 0.0604409 0.0689498 -0.2885095 -0.0260891 -0.0420168 -9.124380e+00 -11.7260469 0.0304314 0.0065399 -0.0036567 0.0166837 0.0064695 0.0591927 0.0483809 -0.0521430 0.0005977 0.1714954 -0.0525825 -0.0069618 0.0973946 -0.1159418 -0.0798847 -0.0383601
BIRTH_ORDER 0.3062480 -0.0336662 -0.0151014 0.0559228 0.0708484 -0.0194789 2.7782954 -0.0446187 0.4759151 0.6058507 -0.1527548 -0.2064625 0.0437045 0.0661018 -0.2657959 0.0038325 -0.0042017 -4.066946e+00 -8.0309061 0.0162266 0.0020745 0.0204810 0.0102844 0.0025491 0.0783904 0.0893956 -0.0408917 -0.0313456 0.1186667 -0.0347737 -0.0244366 0.1156957 -0.1070989 -0.0826272 -0.0666995
MOTHER_EDUCATION_LEVEL -0.5056081 0.0703386 -0.0207799 -0.0725537 0.1728842 -0.0482226 -0.6589255 0.0694772 -0.1504167 -0.1527548 0.4537112 0.2989346 -0.1247846 -0.1016842 0.4634682 0.0427903 -0.0546218 -9.432562e+00 -0.7563728 0.0342288 0.0120601 0.0563271 -0.0521606 -0.1083823 -0.0998207 -0.1188777 0.0890791 -0.0287613 -0.1399916 0.0642558 -0.0750501 -0.1705812 0.1759783 0.1607187 0.0823283
MOTHER_EMPLOYMENT_STATUS -0.4834921 0.0788650 -0.0051510 -0.0615836 0.1742379 -0.0349847 -1.0085616 0.0937555 -0.2174677 -0.2064625 0.2989346 0.7414648 -0.1287226 -0.1102634 0.5192328 0.0646602 0.0126050 8.527566e+00 15.1098590 0.0439858 0.0268275 0.0953729 -0.0593509 -0.1031258 -0.0740480 -0.0837699 0.0766499 0.0037622 -0.1521747 0.0375514 -0.0418410 -0.1366162 0.1765761 0.1630393 0.0852994
QUALITY_OF_HOUSING 0.1135509 -0.0434056 0.0421047 0.0160508 -0.1245209 0.0276713 -0.9502655 -0.0363032 0.0604409 0.0437045 -0.1247846 -0.1287226 0.8175521 0.0488028 -0.2037727 -0.0265286 0.0546218 -7.444974e+00 -11.2382300 -0.0488907 -0.0111986 0.0239267 0.0691431 0.0349144 0.0110580 0.0431595 -0.0277944 0.0303084 0.0198481 0.0324707 0.1282128 0.0914701 -0.0793748 -0.0778102 -0.0365318
HOUSING_CONDITIONS 0.1806019 -0.0256848 0.0111459 0.0159101 -0.0724482 0.0347737 -1.9113076 -0.0402236 0.0689498 0.0661018 -0.1016842 -0.1102634 0.0488028 0.0907141 -0.1358602 -0.0159277 0.0420168 8.008509e-01 -5.0638691 0.0073837 -0.0065399 0.0120601 0.0253331 0.0649590 0.0626560 0.0692662 -0.0444956 0.0036040 0.0427903 -0.0272494 0.0531803 0.0916810 -0.0773355 -0.0671741 -0.0120601
HOUSEHOLD_MONTHLY_INCOME -0.7306354 0.1480433 0.0073837 -0.0887100 0.2713336 -0.0789705 4.5684575 0.1101930 -0.2885095 -0.2657959 0.4634682 0.5192328 -0.2037727 -0.1358602 1.1940157 0.0993284 -0.0588235 -2.405427e+01 -6.8810696 0.0738546 0.0509124 0.1400970 -0.0323125 -0.1596287 -0.0879364 -0.1309026 0.1442636 -0.0381316 -0.2445238 0.0846841 -0.0830667 -0.2072712 0.2484793 0.2118421 0.0951971
BIRTH_WEIGHT -0.0727647 0.0119897 -0.0025140 -0.0128336 0.0562568 0.0052389 1.8406174 0.0172638 -0.0260891 0.0038325 0.0427903 0.0646602 -0.0265286 -0.0159277 0.0993284 0.0839281 0.0000000 -1.783833e+00 0.1058155 0.0058366 0.0059949 0.0022503 0.0065399 -0.0094758 -0.0112162 0.0177385 0.0201294 -0.0113568 0.0020921 0.0158750 -0.0063816 -0.0311698 0.0180198 0.0178088 0.0271615
BREASTFEEDING 0.2857143 -0.0672269 -0.0336134 0.0000000 -0.0378151 0.0294118 4.4117647 -0.0084034 -0.0420168 -0.0042017 -0.0546218 0.0126050 0.0546218 0.0420168 -0.0588235 0.0000000 1.0588235 2.220378e+02 184.4873950 -0.1722689 0.0210084 0.0672269 0.0294118 0.0714286 0.0000000 0.1470588 -0.0420168 -0.0042017 -0.0126050 -0.0126050 0.0714286 0.1176471 -0.0588235 -0.0630252 -0.0252101
BREASTFEEDING_FREQUENCY 47.4916142 -11.0597553 -13.5679125 -3.1268767 0.1662213 13.9219788 1753.6700538 6.2663584 -9.1243803 -4.0669456 -9.4325621 8.5275658 -7.4449738 0.8008509 -24.0542702 -1.7838332 222.0378151 1.032018e+05 81188.4077740 -158.8968039 -41.0781970 2.8436236 -7.5073837 20.4049787 -15.0492775 24.7810028 -0.8092542 0.7039309 27.4577898 -12.1717591 -4.3966984 11.0369537 -15.2968426 -2.0703913 -2.9024472
BREASTFEEDING_DURING_NIGHT 1.3431314 -7.7784009 -14.2560740 -3.9717134 6.7569178 7.3551387 -1927.9782532 3.7417461 -11.7260469 -8.0309061 -0.7563728 15.1098590 -11.2382300 -5.0638691 -6.8810696 0.1058155 184.4873950 8.118841e+04 83566.6301818 -127.4657712 -31.1560072 1.9344784 -2.3336732 9.5008614 -18.1670476 20.0365845 0.8831968 7.1224992 10.4734538 0.9171970 -6.2053022 -2.4352871 -9.5845259 3.2027355 -1.9302767
BOTTLE_FEEDING 0.0333146 -0.0303787 0.0368658 0.0014416 -0.0064344 -0.0223797 -0.5337717 -0.0196899 0.0304314 0.0162266 0.0342288 0.0439858 -0.0488907 0.0073837 0.0738546 0.0058366 -0.1722689 -1.588968e+02 -127.4657712 0.8254984 0.1780880 0.0575578 0.0162793 -0.0485039 0.0646074 -0.0249464 0.0136247 -0.0256496 -0.0135192 0.0165430 0.0169825 -0.0105657 0.0085088 -0.0302380 -0.0365493
INFANT_FORMULAS -0.0403115 0.0440737 0.0007208 -0.0087198 -0.0067860 -0.0050280 -3.4669667 0.0216413 0.0065399 0.0020745 0.0120601 0.0268275 -0.0111986 -0.0065399 0.0509124 0.0059949 0.0210084 -4.107820e+01 -31.1560072 0.1780880 0.2468268 -0.0088429 0.0484863 -0.0069794 0.0184065 0.0051686 0.0233466 -0.0338244 0.0022503 0.0074364 0.0412609 -0.0121304 -0.0164200 -0.0170353 0.0130445
ADDITIONAL_FOOD_SWEETENING 0.0467107 -0.0134841 0.0060124 0.0188812 0.0533561 -0.0081221 2.2320945 -0.0251573 -0.0036567 0.0204810 0.0563271 0.0953729 0.0239267 0.0120601 0.1400970 0.0022503 0.0672269 2.843624e+00 1.9344784 0.0575578 -0.0088429 0.4954116 0.0159453 -0.0219402 0.0025843 -0.0142752 0.0173517 -0.0252277 -0.0642382 0.0310819 -0.0025140 -0.0451285 0.0665588 0.0332443 0.0340002
CHILD_FLUORIDE_SUPPLEMENTS 0.0702331 -0.0416828 -0.0164024 0.0099680 -0.0467459 0.0113217 1.9909637 -0.0241377 0.0166837 0.0102844 -0.0521606 -0.0593509 0.0691431 0.0253331 -0.0323125 0.0065399 0.0294118 -7.507384e+00 -2.3336732 0.0162793 0.0484863 0.0159453 0.2752013 0.0622868 0.0399423 0.0052565 -0.0043247 0.0150487 0.1223937 -0.0262649 0.1015435 0.0585774 -0.0878134 -0.0369185 -0.0075419
CHILD_FLUORIDE_TOOTHPASTE 0.2147076 -0.0391688 0.0087550 0.0137829 -0.0905207 0.0434584 0.2367533 -0.0611793 0.0064695 0.0025491 -0.1083823 -0.1031258 0.0349144 0.0649590 -0.1596287 -0.0094758 0.0714286 2.040498e+01 9.5008614 -0.0485039 -0.0069794 -0.0219402 0.0622868 0.4841954 0.0316269 0.1054112 -0.0355473 0.0119897 0.0457790 -0.0255793 0.0926831 0.0801660 -0.0948103 -0.0633417 -0.0116733
CHILD_ORAL_HYGIENE 0.1758553 -0.0178088 -0.0012130 0.0247178 -0.0782497 0.0710770 -1.4686896 -0.0536374 0.0591927 0.0783904 -0.0998207 -0.0740480 0.0110580 0.0626560 -0.0879364 -0.0112162 0.0000000 -1.504928e+01 -18.1670476 0.0646074 0.0184065 0.0025843 0.0399423 0.0316269 0.2583243 0.1817095 -0.0374459 -0.0157343 0.0823986 -0.0388524 0.0735206 0.1057804 -0.0895011 -0.0573116 -0.0319961
CHILD_TOOTH_BRUSHING 0.2232868 -0.1661334 -0.0485215 0.0223269 -0.0585598 0.0951795 -2.8484231 -0.0866707 0.0483809 0.0893956 -0.1188777 -0.0837699 0.0431595 0.0692662 -0.1309026 0.0177385 0.1470588 2.478100e+01 20.0365845 -0.0249464 0.0051686 -0.0142752 0.0052565 0.1054112 0.1817095 0.7557751 -0.0608628 0.0428958 0.1698956 -0.0288844 0.0710770 0.0841567 -0.0800429 -0.0655743 -0.0319433
DIARRHEA_DURING_INFANCY -0.1511902 0.0130797 0.0056608 -0.0243135 0.0514398 -0.0221687 -2.2147428 0.0234169 -0.0521430 -0.0408917 0.0890791 0.0766499 -0.0277944 -0.0444956 0.1442636 0.0201294 -0.0420168 -8.092542e-01 0.8831968 0.0136247 0.0233466 0.0173517 -0.0043247 -0.0355473 -0.0374459 -0.0608628 0.0907141 -0.0204107 -0.0680004 0.0188460 -0.0321719 -0.0832777 0.0479238 0.0461657 0.0162617
MEDICAL_SYRUPS 0.0760346 -0.0340354 -0.0199712 -0.0148026 0.0121655 -0.0675961 -0.3912837 -0.0040083 0.0005977 -0.0313456 -0.0287613 0.0037622 0.0303084 0.0036040 -0.0381316 -0.0113568 -0.0042017 7.039309e-01 7.1224992 -0.0256496 -0.0338244 -0.0252277 0.0150487 0.0119897 -0.0157343 0.0428958 -0.0204107 0.2618403 -0.0044478 -0.0176857 -0.0022151 0.0138005 -0.0038501 -0.0517738 -0.0335959
CHILD_FIRST_DENTIST_VISIT 0.2295805 -0.1005415 -0.0208678 0.0626560 -0.0723427 0.0291481 1.0518442 -0.0412784 0.1714954 0.1186667 -0.1399916 -0.1521747 0.0198481 0.0427903 -0.2445238 0.0020921 -0.0126050 2.745779e+01 10.4734538 -0.0135192 0.0022503 -0.0642382 0.1223937 0.0457790 0.0823986 0.1698956 -0.0680004 -0.0044478 0.8653704 -0.0690552 0.0110228 0.0992933 -0.1185964 -0.0850005 0.0096164
SWEETS_DURING_PREGNANCY -0.0888330 0.0190746 0.0233114 0.0154882 0.0333497 -0.0111459 -5.4078795 0.0068387 -0.0525825 -0.0347737 0.0642558 0.0375514 0.0324707 -0.0272494 0.0846841 0.0158750 -0.0126050 -1.217176e+01 0.9171970 0.0165430 0.0074364 0.0310819 -0.0262649 -0.0255793 -0.0388524 -0.0288844 0.0188460 -0.0176857 -0.0690552 0.2179600 -0.0248585 -0.0337365 0.0548328 0.0403643 0.0067332
FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY 0.0830315 -0.0309237 0.0558876 0.0008790 -0.0880595 0.0564502 -4.8317218 -0.0311698 -0.0069618 -0.0244366 -0.0750501 -0.0418410 0.1282128 0.0531803 -0.0830667 -0.0063816 0.0714286 -4.396698e+00 -6.2053022 0.0169825 0.0412609 -0.0025140 0.1015435 0.0926831 0.0735206 0.0710770 -0.0321719 -0.0022151 0.0110228 -0.0248585 0.6160121 0.0617067 -0.1429978 -0.0801308 -0.0058894
ORAL_HEALTH_DURING_PREGNANCY 0.3362751 -0.0452692 -0.0096867 0.0354066 -0.1114061 0.0018811 0.2974052 -0.0414015 0.0973946 0.1156957 -0.1705812 -0.1366162 0.0914701 0.0916810 -0.2072712 -0.0311698 0.1176471 1.103695e+01 -2.4352871 -0.0105657 -0.0121304 -0.0451285 0.0585774 0.0801660 0.1057804 0.0841567 -0.0832777 0.0138005 0.0992933 -0.0337365 0.0617067 0.5309237 -0.1268415 -0.1219542 -0.0431068
MOTHER_HEALTH_AWARENESS -0.2241307 0.0427903 -0.0110052 -0.0246827 0.0967441 -0.0644492 2.7656728 0.0409620 -0.1159418 -0.1070989 0.1759783 0.1765761 -0.0793748 -0.0773355 0.2484793 0.0180198 -0.0588235 -1.529684e+01 -9.5845259 0.0085088 -0.0164200 0.0665588 -0.0878134 -0.0948103 -0.0895011 -0.0800429 0.0479238 -0.0038501 -0.1185964 0.0548328 -0.1429978 -0.1268415 0.2486551 0.1208291 0.0468865
FATHER_HEALTH_AWARENESS -0.3318449 0.0415597 0.0091769 -0.0167364 0.0790057 -0.0329630 -10.2085546 0.0166661 -0.0798847 -0.0826272 0.1607187 0.1630393 -0.0778102 -0.0671741 0.2118421 0.0178088 -0.0630252 -2.070391e+00 3.2027355 -0.0302380 -0.0170353 0.0332443 -0.0369185 -0.0633417 -0.0573116 -0.0655743 0.0461657 -0.0517738 -0.0850005 0.0403643 -0.0801308 -0.1219542 0.1208291 0.3035055 0.0591927
ECC -0.1265427 0.0344925 0.0444077 -0.0272846 0.0096691 -0.0212897 -6.4421785 -0.0000527 -0.0383601 -0.0666995 0.0823283 0.0852994 -0.0365318 -0.0120601 0.0951971 0.0271615 -0.0252101 -2.902447e+00 -1.9302767 -0.0365493 0.0130445 0.0340002 -0.0075419 -0.0116733 -0.0319961 -0.0319433 0.0162617 -0.0335959 0.0096164 0.0067332 -0.0058894 -0.0431068 0.0468865 0.0591927 0.2096973
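Covariances depend on the scale of each attribute, which is why the rows for MOTHER_ETHNICITY, BREASTFEEDING_FREQUENCY and BREASTFEEDING_DURING_NIGHT contain much larger entries than the rest of the covariance matrix. The Pearson correlation matrix is the same covariance matrix with each entry divided by the two standard deviations involved. The sketch below checks this relation with `cov2cor()` from base R on made-up toy data, where column `c` imitates a column containing 999 codes.

```r
# Toy data; column c mimics an attribute with large 999 entries.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 1, 4, 3, 5),
  c = c(1, 999, 2, 999, 3)
)

cov_matrix <- cov(toy)

# Rescale the covariance matrix: divide entry (i, j) by sd_i * sd_j.
cor_from_cov <- cov2cor(cov_matrix)

# The rescaled matrix agrees with the directly computed Pearson correlations.
all.equal(cor_from_cov, cor(toy))
```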
# Pearson correlation matrix of the numeric attributes
kable(cor(NUM))
CHILD_ETHNICITY CHILD_AGE CHILD_GENDER CHILD_SERBIAN_LANGUAGE MOTHER_AGE MARITAL_STATUS MOTHER_ETHNICITY MOTHER_SERBIAN_LANGUAGE NUMBER_OF_CHILDREN BIRTH_ORDER MOTHER_EDUCATION_LEVEL MOTHER_EMPLOYMENT_STATUS QUALITY_OF_HOUSING HOUSING_CONDITIONS HOUSEHOLD_MONTHLY_INCOME BIRTH_WEIGHT BREASTFEEDING BREASTFEEDING_FREQUENCY BREASTFEEDING_DURING_NIGHT BOTTLE_FEEDING INFANT_FORMULAS ADDITIONAL_FOOD_SWEETENING CHILD_FLUORIDE_SUPPLEMENTS CHILD_FLUORIDE_TOOTHPASTE CHILD_ORAL_HYGIENE CHILD_TOOTH_BRUSHING DIARRHEA_DURING_INFANCY MEDICAL_SYRUPS CHILD_FIRST_DENTIST_VISIT SWEETS_DURING_PREGNANCY FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY ORAL_HEALTH_DURING_PREGNANCY MOTHER_HEALTH_AWARENESS FATHER_HEALTH_AWARENESS ECC
CHILD_ETHNICITY 1.0000000 -0.1749489 0.0458069 0.3125799 -0.2236191 0.0584724 0.2484739 -0.2700453 0.2590974 0.2653392 -0.5062151 -0.3786649 0.0846922 0.4043858 -0.4509273 -0.1693861 0.1872540 0.0996975 0.0031334 0.0247279 -0.0547197 0.0447553 0.0902875 0.2080885 0.2333370 0.1732119 -0.3385299 0.1002083 0.1664351 -0.1283208 0.0713443 0.3112358 -0.3031192 -0.4062213 -0.1863595
CHILD_AGE -0.1749489 1.0000000 0.0682532 -0.1128055 0.1843916 0.0252681 0.0860941 0.0400513 -0.0134590 -0.0554175 0.1337950 0.1173478 -0.0615070 -0.1092632 0.1735880 0.0530263 -0.0837080 -0.0441101 -0.0344754 -0.0428397 0.1136630 -0.0245456 -0.1018046 -0.0721217 -0.0448939 -0.2448479 0.0556412 -0.0852213 -0.1384778 0.0523483 -0.0504815 -0.0796017 0.1099469 0.0966552 0.0965081
CHILD_GENDER 0.0458069 0.0682532 1.0000000 -0.0770224 -0.1768189 0.0310033 -0.0794708 -0.1465002 0.0149906 -0.0387792 -0.0616617 -0.0119567 0.0930756 0.0739673 0.0135062 -0.0173448 -0.0652926 -0.0844175 -0.0985704 0.0811014 0.0028999 0.0170738 -0.0624949 0.0251482 -0.0047704 -0.1115580 0.0375670 -0.0780096 -0.0448371 0.0998029 0.1423259 -0.0265720 -0.0441127 0.0332947 0.1938318
CHILD_SERBIAN_LANGUAGE 0.3125799 -0.1128055 -0.0770224 1.0000000 -0.1814429 0.0303336 -0.0536624 -0.1506973 0.2155456 0.2105393 -0.3156437 -0.2095788 0.0520194 0.1547974 -0.2379001 -0.1298140 0.0000000 -0.0285229 -0.0402614 0.0046495 -0.0514325 0.0786092 0.0556814 0.0580441 0.1425132 0.0752592 -0.2365577 -0.0847708 0.1973736 0.0972165 0.0032819 0.1423954 -0.1450510 -0.0890238 -0.1746014
MOTHER_AGE -0.2236191 0.1843916 -0.1768189 -0.1814429 1.0000000 -0.1289554 0.0322207 0.1874197 0.0557514 0.1258653 0.3549148 0.2798053 -0.1904334 -0.3326204 0.3433659 0.2685220 -0.0508174 0.0007155 0.0323214 -0.0097928 -0.0188875 0.1048237 -0.1232187 -0.1798855 -0.2128917 -0.0931455 0.2361677 0.0328754 -0.1075357 0.0987785 -0.1551458 -0.2114226 0.2682777 0.1983049 0.0291978
MARITAL_STATUS 0.0584724 0.0252681 0.0310033 0.0303336 -0.1289554 1.0000000 0.0430172 -0.2171557 -0.0729327 -0.0450615 -0.1289093 -0.0731570 0.0551055 0.2078917 -0.1301317 0.0325620 0.0514674 0.0780334 0.0458139 -0.0443525 -0.0182229 -0.0207782 0.0388605 0.1124570 0.2518079 0.1971379 -0.1325336 -0.2378627 0.0564198 -0.0429882 0.1295072 0.0046485 -0.2327245 -0.1077373 -0.0837136
MOTHER_ETHNICITY 0.2484739 0.0860941 -0.0794708 -0.0536624 0.0322207 0.0430172 1.0000000 0.0195604 0.0215248 0.0249630 -0.0068414 -0.0081914 -0.0073500 -0.0443807 0.0292392 0.0444335 0.0299848 0.0381773 -0.0466430 -0.0041086 -0.0488039 0.0221784 0.0265423 0.0023795 -0.0202092 -0.0229144 -0.0514265 -0.0053478 0.0079077 -0.0810102 -0.0430535 0.0028545 0.0387885 -0.1295931 -0.0983869
MOTHER_SERBIAN_LANGUAGE -0.2700453 0.0400513 -0.1465002 -0.1506973 0.1874197 -0.2171557 0.0195604 1.0000000 -0.2186217 -0.1291851 0.2324506 0.2453748 -0.0904828 -0.3009693 0.2272624 0.1342954 -0.0184043 0.0439592 0.0291700 -0.0488386 0.0981670 -0.0805490 -0.1036929 -0.1981402 -0.2378281 -0.2246747 0.1752146 -0.0176531 -0.1000001 0.0330115 -0.0894989 -0.1280497 0.1851232 0.0681755 -0.0002596
NUMBER_OF_CHILDREN 0.2590974 -0.0134590 0.0149906 0.2155456 0.0557514 -0.0729327 0.0215248 -0.2186217 1.0000000 0.8100678 -0.2958563 -0.3345987 0.0885621 0.3032983 -0.3498081 -0.1193109 -0.0540986 -0.0376300 -0.0537415 0.0443750 0.0174400 -0.0068830 0.0421348 0.0123179 0.1542980 0.0737313 -0.2293684 0.0015476 0.2442452 -0.1492203 -0.0117517 0.1770898 -0.3080463 -0.1921122 -0.1109834
BIRTH_ORDER 0.2653392 -0.0554175 -0.0387792 0.2105393 0.1258653 -0.0450615 0.0249630 -0.1291851 0.8100678 1.0000000 -0.2913549 -0.3080443 0.0620992 0.2819634 -0.3125075 0.0169959 -0.0052460 -0.0162645 -0.0356915 0.0229449 0.0053645 0.0373840 0.0251868 0.0047065 0.1981514 0.1321104 -0.1744274 -0.0787001 0.1638872 -0.0956930 -0.0400002 0.2039944 -0.2759329 -0.1926890 -0.1871299
MOTHER_EDUCATION_LEVEL -0.5062151 0.1337950 -0.0616617 -0.3156437 0.3549148 -0.1289093 -0.0068414 0.2324506 -0.2958563 -0.2913549 1.0000000 0.5153962 -0.2048867 -0.5012175 0.6296877 0.2192816 -0.0788070 -0.0435909 -0.0038845 0.0559298 0.0360382 0.1188078 -0.1476141 -0.2312374 -0.2915736 -0.2030085 0.4390853 -0.0834450 -0.2234144 0.2043311 -0.1419603 -0.3475565 0.5239270 0.4331051 0.2669090
MOTHER_EMPLOYMENT_STATUS -0.3786649 0.1173478 -0.0119567 -0.2095788 0.2798053 -0.0731570 -0.0081914 0.2453748 -0.3345987 -0.3080443 0.5153962 1.0000000 -0.1653301 -0.4251562 0.5518383 0.2592017 0.0142261 0.0308273 0.0607014 0.0562224 0.0627102 0.1573608 -0.1313884 -0.1721122 -0.1691943 -0.1119042 0.2955486 0.0085384 -0.1899748 0.0934099 -0.0619102 -0.2177413 0.4112329 0.3436875 0.2163238
QUALITY_OF_HOUSING 0.0846922 -0.0615070 0.0930756 0.0520194 -0.1904334 0.0551055 -0.0073500 -0.0904828 0.0885621 0.0620992 -0.2048867 -0.1653301 1.0000000 0.1792047 -0.2062449 -0.1012751 0.0587079 -0.0256308 -0.0429956 -0.0595128 -0.0249293 0.0375961 0.1457693 0.0554928 0.0240622 0.0549064 -0.1020615 0.0655068 0.0235972 0.0769212 0.1806671 0.1388370 -0.1760461 -0.1562052 -0.0882301
HOUSING_CONDITIONS 0.4043858 -0.1092632 0.0739673 0.1547974 -0.3326204 0.2078917 -0.0443807 -0.3009693 0.3032983 0.2819634 -0.5012175 -0.4251562 0.1792047 1.0000000 -0.4128096 -0.1825417 0.1355732 0.0082770 -0.0581606 0.0269823 -0.0437053 0.0568891 0.1603343 0.3099502 0.4093010 0.2645377 -0.4905039 0.0233842 0.1527240 -0.1937899 0.2249668 0.4177592 -0.5149238 -0.4048382 -0.0874411
HOUSEHOLD_MONTHLY_INCOME -0.4509273 0.1735880 0.0135062 -0.2379001 0.3433659 -0.1301317 0.0292392 0.2272624 -0.3498081 -0.3125075 0.6296877 0.5518383 -0.2062449 -0.4128096 1.0000000 0.3137724 -0.0523160 -0.0685241 -0.0217838 0.0743900 0.0937827 0.1821549 -0.0563690 -0.2099402 -0.1583366 -0.1377993 0.4383431 -0.0681964 -0.2405553 0.1660001 -0.0968562 -0.2603262 0.4560228 0.3519037 0.1902489
BIRTH_WEIGHT -0.1693861 0.0530263 -0.0173448 -0.1298140 0.2685220 0.0325620 0.0444335 0.1342954 -0.1193109 0.0169959 0.2192816 0.2592017 -0.1012751 -0.1825417 0.3137724 1.0000000 0.0000000 -0.0191671 0.0012635 0.0221744 0.0416514 0.0110357 0.0430318 -0.0470056 -0.0761745 0.0704314 0.2306957 -0.0766100 0.0077628 0.1173737 -0.0280662 -0.1476604 0.1247374 0.1115829 0.2047404
BREASTFEEDING 0.1872540 -0.0837080 -0.0652926 0.0000000 -0.0508174 0.0514674 0.0299848 -0.0184043 -0.0540986 -0.0052460 -0.0788070 0.0142261 0.0587079 0.1355732 -0.0523160 0.0000000 1.0000000 0.6716940 0.6202096 -0.1842625 0.0410946 0.0928214 0.0544859 0.0997585 0.0000000 0.1643929 -0.1355732 -0.0079798 -0.0131684 -0.0262388 0.0884434 0.1569109 -0.1146412 -0.1111781 -0.0535015
BREASTFEEDING_FREQUENCY 0.0996975 -0.0441101 -0.0844175 -0.0285229 0.0007155 0.0780334 0.0381773 0.0439592 -0.0376300 -0.0162645 -0.0435909 0.0308273 -0.0256308 0.0082770 -0.0685241 -0.0191671 0.6716940 1.0000000 0.8742465 -0.5443940 -0.2573781 0.0125761 -0.0445471 0.0912814 -0.0921700 0.0887317 -0.0083638 0.0042822 0.0918800 -0.0811561 -0.0174377 0.0471508 -0.0954903 -0.0116984 -0.0197299
BREASTFEEDING_DURING_NIGHT 0.0031334 -0.0344754 -0.0985704 -0.0402614 0.0323214 0.0458139 -0.0466430 0.0291700 -0.0537415 -0.0356915 -0.0038845 0.0607014 -0.0429956 -0.0581606 -0.0217838 0.0012635 0.6202096 0.8742465 1.0000000 -0.4853097 -0.2169348 0.0095075 -0.0153886 0.0472320 -0.1236475 0.0797280 0.0101439 0.0481502 0.0389469 0.0067961 -0.0273497 -0.0115616 -0.0664899 0.0201104 -0.0145817
BOTTLE_FEEDING 0.0247279 -0.0428397 0.0811014 0.0046495 -0.0097928 -0.0443525 -0.0041086 -0.0488386 0.0443750 0.0229449 0.0559298 0.0562224 -0.0595128 0.0269823 0.0743900 0.0221744 -0.1842625 -0.5443940 -0.4853097 1.0000000 0.3945303 0.0900042 0.0341549 -0.0767200 0.1399078 -0.0315830 0.0497888 -0.0551701 -0.0159953 0.0390003 0.0238149 -0.0159597 0.0187808 -0.0604105 -0.0878466
INFANT_FORMULAS -0.0547197 0.1136630 0.0028999 -0.0514325 -0.0188875 -0.0182229 -0.0488039 0.0981670 0.0174400 0.0053645 0.0360382 0.0627102 -0.0249293 -0.0437053 0.0937827 0.0416514 0.0410946 -0.2573781 -0.2169348 0.3945303 1.0000000 -0.0252880 0.1860364 -0.0201887 0.0728942 0.0119669 0.1560234 -0.1330503 0.0048690 0.0320613 0.1058151 -0.0335090 -0.0662792 -0.0622400 0.0573372
ADDITIONAL_FOOD_SWEETENING 0.0447553 -0.0245456 0.0170738 0.0786092 0.1048237 -0.0207782 0.0221784 -0.0805490 -0.0068830 0.0373840 0.1188078 0.1573608 0.0375961 0.0568891 0.1821549 0.0110357 0.0928214 0.0125761 0.0095075 0.0900042 -0.0252880 1.0000000 0.0431841 -0.0447967 0.0072240 -0.0233293 0.0818506 -0.0700448 -0.0981092 0.0945880 -0.0045508 -0.0879938 0.1896374 0.0857335 0.1054878
CHILD_FLUORIDE_SUPPLEMENTS 0.0902875 -0.1018046 -0.0624949 0.0556814 -0.1232187 0.0388605 0.0265423 -0.1036929 0.0421348 0.0251868 -0.1476141 -0.1313884 0.1457693 0.1603343 -0.0563690 0.0430318 0.0544859 -0.0445471 -0.0153886 0.0341549 0.1860364 0.0431841 1.0000000 0.1706321 0.1498048 0.0115259 -0.0273714 0.0560603 0.2508031 -0.1072413 0.2466224 0.1532459 -0.3356887 -0.1277426 -0.0313950
CHILD_FLUORIDE_TOOTHPASTE 0.2080885 -0.0721217 0.0251482 0.0580441 -0.1798855 0.1124570 0.0023795 -0.1981402 0.0123179 0.0047065 -0.2312374 -0.1721122 0.0554928 0.3099502 -0.2099402 -0.0470056 0.0997585 0.0912814 0.0472320 -0.0767200 -0.0201887 -0.0447967 0.1706321 1.0000000 0.0894259 0.1742530 -0.1696128 0.0336729 0.0707220 -0.0787389 0.1697054 0.1581116 -0.2732414 -0.1652326 -0.0366342
CHILD_ORAL_HYGIENE 0.2333370 -0.0448939 -0.0047704 0.1425132 -0.2128917 0.2518079 -0.0202092 -0.2378281 0.1542980 0.1981514 -0.2915736 -0.1691943 0.0240622 0.4093010 -0.1583366 -0.0761745 0.0000000 -0.0921700 -0.1236475 0.1399078 0.0728942 0.0072240 0.1498048 0.0894259 1.0000000 0.4112432 -0.2446159 -0.0604989 0.1742756 -0.1637369 0.1843028 0.2856318 -0.3531400 -0.2046807 -0.1374730
CHILD_TOOTH_BRUSHING 0.1732119 -0.2448479 -0.1115580 0.0752592 -0.0931455 0.1971379 -0.0229144 -0.2246747 0.0737313 0.1321104 -0.2030085 -0.1119042 0.0549064 0.2645377 -0.1377993 0.0704314 0.1643929 0.0887317 0.0797280 -0.0315830 0.0119669 -0.0233293 0.0115259 0.1742530 0.4112432 1.0000000 -0.2324441 0.0964274 0.2100800 -0.0711669 0.1041689 0.1328545 -0.1846409 -0.1369161 -0.0802393
DIARRHEA_DURING_INFANCY -0.3385299 0.0556412 0.0375670 -0.2365577 0.2361677 -0.1325336 -0.0514265 0.1752146 -0.2293684 -0.1744274 0.4390853 0.2955486 -0.1020615 -0.4905039 0.4383431 0.2306957 -0.1355732 -0.0083638 0.0101439 0.0497888 0.1560234 0.0818506 -0.0273714 -0.1696128 -0.2446159 -0.2324441 1.0000000 -0.1324347 -0.2427019 0.1340276 -0.1360956 -0.3794679 0.3190912 0.2782269 0.1179052
MEDICAL_SYRUPS 0.1002083 -0.0852213 -0.0780096 -0.0847708 0.0328754 -0.2378627 -0.0053478 -0.0176531 0.0015476 -0.0787001 -0.0834450 0.0085384 0.0655068 0.0233842 -0.0681964 -0.0766100 -0.0079798 0.0042822 0.0481502 -0.0551701 -0.1330503 -0.0700448 0.0560603 0.0336729 -0.0604989 0.0964274 -0.1324347 1.0000000 -0.0093439 -0.0740315 -0.0055155 0.0370135 -0.0150887 -0.1836576 -0.1433743
CHILD_FIRST_DENTIST_VISIT 0.1664351 -0.1384778 -0.0448371 0.1973736 -0.1075357 0.0564198 0.0079077 -0.1000001 0.2442452 0.1638872 -0.2234144 -0.1899748 0.0235972 0.1527240 -0.2405553 0.0077628 -0.0131684 0.0918800 0.0389469 -0.0159953 0.0048690 -0.0981092 0.2508031 0.0707220 0.1742756 0.2100800 -0.2427019 -0.0093439 1.0000000 -0.1590037 0.0150972 0.1464882 -0.2556653 -0.1658583 0.0225743
SWEETS_DURING_PREGNANCY -0.1283208 0.0523483 0.0998029 0.0972165 0.0987785 -0.0429882 -0.0810102 0.0330115 -0.1492203 -0.0956930 0.2043311 0.0934099 0.0769212 -0.1937899 0.1660001 0.1173737 -0.0262388 -0.0811561 0.0067961 0.0390003 0.0320613 0.0945880 -0.1072413 -0.0787389 -0.1637369 -0.0711669 0.1340276 -0.0740315 -0.1590037 1.0000000 -0.0678409 -0.0991735 0.2355339 0.1569370 0.0314948
FLUORIDE_SUPPLEMENTS_DURING_PREGNANCY 0.0713443 -0.0504815 0.1423259 0.0032819 -0.1551458 0.1295072 -0.0430535 -0.0894989 -0.0117517 -0.0400002 -0.1419603 -0.0619102 0.1806671 0.2249668 -0.0968562 -0.0280662 0.0884434 -0.0174377 -0.0273497 0.0238149 0.1058151 -0.0045508 0.2466224 0.1697054 0.1843028 0.1041689 -0.1360956 -0.0055155 0.0150972 -0.0678409 1.0000000 0.1079000 -0.3653726 -0.1853197 -0.0163862
ORAL_HEALTH_DURING_PREGNANCY 0.3112358 -0.0796017 -0.0265720 0.1423954 -0.2114226 0.0046485 0.0028545 -0.1280497 0.1770898 0.2039944 -0.3475565 -0.2177413 0.1388370 0.4177592 -0.2603262 -0.1476604 0.1569109 0.0471508 -0.0115616 -0.0159597 -0.0335090 -0.0879938 0.1532459 0.1581116 0.2856318 0.1328545 -0.3794679 0.0370135 0.1464882 -0.0991735 0.1079000 1.0000000 -0.3490975 -0.3038068 -0.1291913
MOTHER_HEALTH_AWARENESS -0.3031192 0.1099469 -0.0441127 -0.1450510 0.2682777 -0.2327245 0.0387885 0.1851232 -0.3080463 -0.2759329 0.5239270 0.4112329 -0.1760461 -0.5149238 0.4560228 0.1247374 -0.1146412 -0.0954903 -0.0664899 0.0187808 -0.0662792 0.1896374 -0.3356887 -0.2732414 -0.3531400 -0.1846409 0.3190912 -0.0150887 -0.2556653 0.2355339 -0.3653726 -0.3490975 1.0000000 0.4398347 0.2053303
FATHER_HEALTH_AWARENESS -0.4062213 0.0966552 0.0332947 -0.0890238 0.1983049 -0.1077373 -0.1295931 0.0681755 -0.1921122 -0.1926890 0.4331051 0.3436875 -0.1562052 -0.4048382 0.3519037 0.1115829 -0.1111781 -0.0116984 0.0201104 -0.0604105 -0.0622400 0.0857335 -0.1277426 -0.1652326 -0.2046807 -0.1369161 0.2782269 -0.1836576 -0.1658583 0.1569370 -0.1853197 -0.3038068 0.4398347 1.0000000 0.2346327
ECC -0.1863595 0.0965081 0.1938318 -0.1746014 0.0291978 -0.0837136 -0.0983869 -0.0002596 -0.1109834 -0.1871299 0.2669090 0.2163238 -0.0882301 -0.0874411 0.1902489 0.2047404 -0.0535015 -0.0197299 -0.0145817 -0.0878466 0.0573372 0.1054878 -0.0313950 -0.0366342 -0.1374730 -0.0802393 0.1179052 -0.1433743 0.0225743 0.0314948 -0.0163862 -0.1291913 0.2053303 0.2346327 1.0000000

To get an idea of the outliers, we draw a boxplot for each numerical attribute (every column except CITY).

for (col in 2:ncol(TRAIN)) {
  boxplot(TRAIN[,col],main=paste("Boxplot of the",colnames(TRAIN)[col] ))
}
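As a complement to the plots, the points that a boxplot draws as individual outliers can also be listed numerically. The sketch below is a minimal illustration on a made-up vector (`x_demo` is hypothetical, not a column of the ECC data). It relies on `boxplot.stats()` from base R, which by default uses the same 1.5 * IQR whisker rule as `boxplot()`.

```r
# Hypothetical toy vector; 40 lies far away from the other values
x_demo <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 40)

# boxplot.stats() returns the whisker/hinge summary and the points beyond the whiskers
stats_demo <- boxplot.stats(x_demo)
outliers_demo <- stats_demo$out

print(stats_demo$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(outliers_demo)     # values further than 1.5 * IQR from the hinges
```

On the real data the same call could be applied per column, for example with `lapply(TRAIN[, 2:ncol(TRAIN)], function(v) boxplot.stats(v)$out)`.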


3. Classification Methods Implemented:

library(ade4)
library(data.table)

#COMBINE ALL DATA SO THAT ENCODING AND NORMALIZATION ARE CONSISTENT ACROSS THE SPLITS
ALL_DATA <- rbind(TRAIN, VALIDATION, TEST)
ALL_DATA_x <- ALL_DATA[,1:35]
ALL_DATA_y <- ALL_DATA[36]

#APPLY ONE HOT METHOD TO CATEGORICAL AND NULL(999) INVOLVING FEATURES
col_names <- c("CITY", "CHILD_ETHNICITY", "MOTHER_ETHNICITY", "BREASTFEEDING_FREQUENCY", "BREASTFEEDING_DURING_NIGHT", "MOTHER_EMPLOYMENT_STATUS")
for (f in col_names){
  df_all_dummy = acm.disjonctif(ALL_DATA_x[f])
  ALL_DATA_x[f] = NULL
  ALL_DATA_x = cbind(ALL_DATA_x, df_all_dummy)
}

#DELETE .999 FEATURES
col_names999 <- c("MOTHER_ETHNICITY.999", "BREASTFEEDING_FREQUENCY.999", "BREASTFEEDING_DURING_NIGHT.999")
for (f in col_names999){
  ALL_DATA_x[f] = NULL
}
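To make the encoding step concrete, the following base-R sketch reproduces the idea on a made-up data frame: each level of a categorical column becomes a 0/1 indicator column named `<column>.<level>`, and the indicator of the missing-value code 999 is then dropped, so rows with a missing value carry zeros in all remaining indicators. The helper `one_hot_demo` and the toy columns are hypothetical; the pipeline above uses `acm.disjonctif` from ade4 instead.

```r
# Hypothetical helper: replace one categorical column by 0/1 indicator columns
one_hot_demo <- function(df, col) {
  f <- factor(df[[col]])
  # one indicator column per level of the factor
  ind <- sapply(levels(f), function(l) as.integer(f == l))
  colnames(ind) <- paste(col, levels(f), sep = ".")
  cbind(df[setdiff(names(df), col)], ind)
}

# Toy data: FREQ uses 999 as its missing-value code
toy <- data.frame(AGE = c(3, 4, 5, 4), FREQ = c(1, 2, 999, 1))
toy_enc <- one_hot_demo(toy, "FREQ")

# Drop the indicator of the missing-value code
toy_enc["FREQ.999"] <- NULL
print(toy_enc)
```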

#NORMALIZATION FUNCTION
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

#APPLY NORMALIZATION
ALL_DATA_x <- as.data.frame(lapply(ALL_DATA_x, normalize))
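Min-max normalization maps every column to the interval [0, 1] through (x - min) / (max - min). The sketch below applies it to a made-up data frame and adds a guard for constant columns, where max - min is zero and the plain formula would return NaN. `normalize_safe` and the toy data are hypothetical additions for illustration, not part of the original pipeline.

```r
# Hypothetical variant of normalize() that tolerates constant columns
normalize_safe <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))  # constant column: no spread to rescale
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

toy <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5))
toy_norm <- as.data.frame(lapply(toy, normalize_safe))
print(toy_norm)
```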

3.1. Association Rule Mining (implemented in the paper)

col_names <- colnames(TRAIN)
TRAIN_factor <- as.data.frame(lapply(TRAIN[,col_names], factor))

rules1 <- apriori(TRAIN_factor, appearance = list(rhs=c("ECC=1"), default="lhs"), parameter = list(minlen=2, maxlen=7, sup = 0.1, conf = 0.4, target="rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5     0.1      2
##  maxlen target   ext
##       7  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 23 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[125 item(s), 239 transaction(s)] done [0.00s].
## sorting and recoding items ... [93 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7
## Warning in apriori(TRAIN_factor, appearance = list(rhs = c("ECC=1"),
## default = "lhs"), : Mining stopped (maxlen reached). Only patterns up to a
## length of 7 returned!
##  done [1.50s].
## writing ... [125 rule(s)] done [0.08s].
## creating S4 object  ... done [0.10s].
rules1<-sort(rules1, decreasing=TRUE, by="confidence")
#inspect(rules1)

rules2 <- apriori(TRAIN_factor, appearance = list(rhs=c("ECC=2"), default="lhs"), parameter = list(minlen=2, maxlen=7, sup = 0.3, conf = 0.8, target="rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.3      2
##  maxlen target   ext
##       7  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 71 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[125 item(s), 239 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7
## Warning in apriori(TRAIN_factor, appearance = list(rhs = c("ECC=2"),
## default = "lhs"), : Mining stopped (maxlen reached). Only patterns up to a
## length of 7 returned!
##  done [0.03s].
## writing ... [246 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules2<-sort(rules2, decreasing=TRUE, by="confidence")
#inspect(rules2)
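The `sup` and `conf` thresholds passed to `apriori` can be read as simple proportions. The sketch below computes support, confidence and lift by hand for a single rule on a made-up table (the `SWEETS` column and its values are hypothetical). Its last lines show that rounding the relative thresholds 0.1 and 0.3 down for 239 transactions gives 23 and 71, which agrees with the "Absolute minimum support count" lines in the logs above.

```r
# Hypothetical toy transactions; SWEETS and its values are made up for illustration
toy <- data.frame(
  SWEETS = c("yes", "yes", "no", "yes", "no", "yes"),
  ECC    = c("1", "1", "2", "2", "2", "1")
)
lhs <- toy$SWEETS == "yes"  # rows matching the left-hand side of the rule
rhs <- toy$ECC == "1"       # rows matching the right-hand side of the rule

support    <- mean(lhs & rhs)            # share of all rows that satisfy the whole rule
confidence <- sum(lhs & rhs) / sum(lhs)  # share of LHS rows that also satisfy the RHS
lift       <- confidence / mean(rhs)     # confidence relative to the base rate of the RHS

# Relative support thresholds expressed as row counts for the 239 training rows
min_counts <- floor(c(0.1, 0.3) * 239)

print(c(support = support, confidence = confidence, lift = lift))
print(min_counts)
```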

Justification for Model Parameters

3.2. SVM

#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x[1:239,]
VALIDATION_conv_x <- ALL_DATA_x[240:273,]
TEST_conv_x <- ALL_DATA_x[274:341,]

TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)

#POSSIBLE COST AND GAMMA VALUES
cost_try = c(0.1, 0.5, 1, 5, 10, 20, 50, 80, 100, 500)
gamma_try = c(0.005, 0.01, 0.02, 0.05, 0.1, 0.5, 1, 2, 5, 10)

#BEST COST AND GAMMA VALUES SELECTED ACCORDING TO ACCURACY
max_accur = 0
best_cost = 1
best_gamma = 1
for (i in 1:10)
{
  for (j in 1:10)
  {
    svm_model <- svm(x = TRAIN_conv_x, y = TRAIN_y, gamma = gamma_try[j], cost = cost_try[i])
    svm_res <- predict(svm_model, VALIDATION_conv_x)
    conf_res <- confusionMatrix(svm_res, VALIDATION_y)
    
    if (max_accur < conf_res$overall[1])
    {
      max_accur = conf_res$overall[1]
      best_cost = cost_try[i]
      best_gamma = gamma_try[j]
      print(conf_res$overall[1])
    }
  }
}
##  Accuracy 
## 0.6764706 
##  Accuracy 
## 0.7058824 
##  Accuracy 
## 0.7647059
#BEST VALUES PRINTED
print(best_cost)
## [1] 5
print(best_gamma)
## [1] 0.01
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
svm_model <- svm(x = TRAIN_conv_x, y = TRAIN_y, gamma = best_gamma, cost = best_cost)
svm_res <- predict(svm_model, TEST_conv_x)
conf_res <- confusionMatrix(svm_res, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  1  2
##          1  5  2
##          2 17 44
##                                           
##                Accuracy : 0.7206          
##                  95% CI : (0.5985, 0.8227)
##     No Information Rate : 0.6765          
##     P-Value [Acc > NIR] : 0.261543        
##                                           
##                   Kappa : 0.2236          
##  Mcnemar's Test P-Value : 0.001319        
##                                           
##             Sensitivity : 0.22727         
##             Specificity : 0.95652         
##          Pos Pred Value : 0.71429         
##          Neg Pred Value : 0.72131         
##              Prevalence : 0.32353         
##          Detection Rate : 0.07353         
##    Detection Prevalence : 0.10294         
##       Balanced Accuracy : 0.59190         
##                                           
##        'Positive' Class : 1               
## 
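The accuracy, the No Information Rate and the Kappa value in this report can be re-derived from the four cells of the confusion matrix. The sketch below does this for the SVM matrix printed above; the variable names are ours and the formulas are the standard definitions (Kappa as observed agreement corrected for chance agreement), written out by hand rather than taken from caret.

```r
# SVM test-set confusion matrix from the output above (rows: prediction, columns: reference)
cm <- matrix(c(5, 17, 2, 44), nrow = 2,
             dimnames = list(Prediction = c("1", "2"), Reference = c("1", "2")))
n <- sum(cm)

accuracy <- sum(diag(cm)) / n                     # observed agreement
nir      <- max(colSums(cm)) / n                  # share of the largest reference class
expected <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
kappa_hat <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, nir = nir, kappa = kappa_hat), 4))
```

Since the accuracy (0.7206) is only slightly above the No Information Rate (0.6765), most of the apparent performance comes from predicting the majority class, which is also what the low Kappa indicates.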

Justification for Model Parameters

3.3. KNN

#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x[1:239,]
VALIDATION_conv_x <- ALL_DATA_x[240:273,]
TEST_conv_x <- ALL_DATA_x[274:341,]

TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)

#BEST K VALUE IS SELECTED ACCORDING TO ACCURACY
max_accur = 0
best_k_val = 1
for (i in 1:100)
{
  test_pred <- knn(train = TRAIN_conv_x, test = VALIDATION_conv_x, cl = TRAIN_y, k=i)
  conf_res <- confusionMatrix(test_pred, VALIDATION_y)
  
  if (max_accur < conf_res$overall[1])
  {
    max_accur = conf_res$overall[1]
    best_k_val = i
    print(conf_res$overall[1])
  }
}
##  Accuracy 
## 0.7058824
#BEST VALUES PRINTED
print(best_k_val)
## [1] 1
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
test_pred <- knn(train = TRAIN_conv_x, test = TEST_conv_x, cl = TRAIN_y, k=best_k_val)
conf_res <- confusionMatrix(test_pred, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  1  2
##          1  8  7
##          2 14 39
##                                           
##                Accuracy : 0.6912          
##                  95% CI : (0.5674, 0.7976)
##     No Information Rate : 0.6765          
##     P-Value [Acc > NIR] : 0.4545          
##                                           
##                   Kappa : 0.2306          
##  Mcnemar's Test P-Value : 0.1904          
##                                           
##             Sensitivity : 0.3636          
##             Specificity : 0.8478          
##          Pos Pred Value : 0.5333          
##          Neg Pred Value : 0.7358          
##              Prevalence : 0.3235          
##          Detection Rate : 0.1176          
##    Detection Prevalence : 0.2206          
##       Balanced Accuracy : 0.6057          
##                                           
##        'Positive' Class : 1               
## 

Justification for Model Parameters

3.4. Naive Bayes

#SEPARATE TEST
TEST_conv_x <- ALL_DATA_x[274:341,]
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)

#VALIDATION COMBINED WITH TRAIN
TV_conv_x <- ALL_DATA_x[1:273,]
TV_y <- c(TRAIN_y, VALIDATION_y)
TV_y <- as.factor(TV_y)

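#NO PARAMETER SELECTION IS NEEDED, SO NB IS APPLIED DIRECTLY
#laplace = 0 IS THE DEFAULT; IT ONLY AFFECTS CATEGORICAL PREDICTORS AND ALL FEATURES HERE ARE NUMERIC
nb_model <- naiveBayes(x = TV_conv_x, y = TV_y, laplace = 0)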
nb_res <- predict(nb_model, TEST_conv_x)
conf_res <- confusionMatrix(nb_res, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  1  2
##          1 15 28
##          2  7 18
##                                           
##                Accuracy : 0.4853          
##                  95% CI : (0.3622, 0.6097)
##     No Information Rate : 0.6765          
##     P-Value [Acc > NIR] : 0.9996453       
##                                           
##                   Kappa : 0.0585          
##  Mcnemar's Test P-Value : 0.0007232       
##                                           
##             Sensitivity : 0.6818          
##             Specificity : 0.3913          
##          Pos Pred Value : 0.3488          
##          Neg Pred Value : 0.7200          
##              Prevalence : 0.3235          
##          Detection Rate : 0.2206          
##    Detection Prevalence : 0.6324          
##       Balanced Accuracy : 0.5366          
##                                           
##        'Positive' Class : 1               
## 

Justification for Model Parameters

3.5. Random Forest

#SEPARATE TRAIN, VALIDATION AND TEST
TRAIN_conv_x <- ALL_DATA_x[1:239,]
VALIDATION_conv_x <- ALL_DATA_x[240:273,]
TEST_conv_x <- ALL_DATA_x[274:341,]

TRAIN_y <- TRAIN[,36]
TRAIN_y <- as.factor(TRAIN_y)
VALIDATION_y <- VALIDATION[,36]
VALIDATION_y <- as.factor(VALIDATION_y)
TEST_y <- TEST[,36]
TEST_y <- as.factor(TEST_y)

#BEST NTREE VALUE IS SELECTED ACCORDING TO ACCURACY
max_accur = 0
res_num_of_tree = 0
num_of_tree = 16
for (i in 1:7)
{
  set.seed(97)
  
  rf_model <- randomForest(x = TRAIN_conv_x, y = TRAIN_y, ntree = num_of_tree)
  rf_res <- predict(rf_model, VALIDATION_conv_x)
  rf_res_round <- as.factor(round(as.numeric(rf_res)))
  conf_res <- confusionMatrix(rf_res_round, VALIDATION_y)
  
  if (conf_res$overall[1] > max_accur)
  {
    max_accur = conf_res$overall[1]
    res_num_of_tree = num_of_tree
    print(conf_res$overall[1])
  }
  
  num_of_tree = num_of_tree*2
}
##  Accuracy 
## 0.5882353 
##  Accuracy 
## 0.6470588 
##  Accuracy 
## 0.6764706
#BEST VALUES PRINTED
print(res_num_of_tree)
## [1] 64
#TEST DATASET IS PREDICTED AND RESULTS ARE DISPLAYED
set.seed(97)
rf_res_model <- randomForest(x = TRAIN_conv_x, y = TRAIN_y, ntree = res_num_of_tree)
rf_res <- predict(rf_res_model, TEST_conv_x) #predict with the model refit using the selected ntree
rf_res_round <- as.factor(round(as.numeric(rf_res)))
conf_res <- confusionMatrix(rf_res_round, TEST_y)
print(conf_res)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  1  2
##          1  4  2
##          2 18 44
##                                           
##                Accuracy : 0.7059          
##                  95% CI : (0.5829, 0.8102)
##     No Information Rate : 0.6765          
##     P-Value [Acc > NIR] : 0.3537252       
##                                           
##                   Kappa : 0.1707          
##  Mcnemar's Test P-Value : 0.0007962       
##                                           
##             Sensitivity : 0.18182         
##             Specificity : 0.95652         
##          Pos Pred Value : 0.66667         
##          Neg Pred Value : 0.70968         
##              Prevalence : 0.32353         
##          Detection Rate : 0.05882         
##    Detection Prevalence : 0.08824         
##       Balanced Accuracy : 0.56917         
##                                           
##        'Positive' Class : 1               
## 

Justification for Model Parameters

3.6. ANN

Justification for Model Parameters

4. Clustering Methods

4.1. K-means

# K-means on the full (train + validation + test) feature matrix
X = ALL_DATA_x

# Applying k-means to the dataset with k = 10
set.seed(13)
km_model = kmeans(X, 10, iter.max = 500) # stored as km_model so the kmeans() function is not masked

# Visualizing library
# install.packages("cluster")
library(cluster)
clusplot(X,
         km_model$cluster,
         lines = 0, # no line wanted
         shade = TRUE, # shade depending on the density
         color = TRUE,
         labels = 0,
         plotchar = FALSE,
         span = TRUE,
         main = paste("Clusters of Data"),
         xlab = "x-axis",
         ylab = "y-axis")

Justification for K-means Parameters

The initial configuration is fixed with a random seed. We run k-means for k = 1:50 and plot the error against k to find the optimal number of clusters with the elbow method.

set.seed(123) 
wcss = vector() # an empty vector
for (i in 1:50) wcss[i] = sum(kmeans(X, i)$withinss)
plot(1:50, wcss, type = "b", main = paste("Clusters"), xlab = "# Clusters", ylab = "Within Cluster SS")
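To show what the elbow looks like when the cluster structure is known, here is a minimal sketch on made-up data with three well-separated groups (`X_demo` and `centers_demo` are hypothetical). It also passes `nstart = 10`, an option of `kmeans` that keeps the best of several random starts; that is an addition of this sketch, not something the chunk above does.

```r
set.seed(1)
# Hypothetical toy data: three well-separated 2-D groups of 30 points each
centers_demo <- rbind(c(0, 0), c(10, 0), c(5, 9))
X_demo <- do.call(rbind, lapply(1:3, function(j) {
  cbind(rnorm(30, centers_demo[j, 1], 0.5), rnorm(30, centers_demo[j, 2], 0.5))
}))

# Total within-cluster sum of squares for k = 1..6
wcss_demo <- sapply(1:6, function(k) kmeans(X_demo, k, nstart = 10)$tot.withinss)

# Error relative to k = 1; the curve flattens once k reaches the number of groups
print(round(wcss_demo / wcss_demo[1], 3))
```

Plotting `wcss_demo` against `1:6` with `type = "b"` gives the usual elbow picture, with the bend at k = 3 for this toy data.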

4.2. Hierarchical Clustering

In this section, we also apply hierarchical clustering. In order to understand which linkages work best for well-separated data, we plot their dendrograms in a for loop.

As seen from the dendrograms, the best separation is obtained when ward.D is used.

# H-clust with different linkages
X = ALL_DATA_x
dend = list(list(),list(),list())
meth = c("ward.D", "single", "average")
names(dend) = meth
# Using dendrogram to find the opt num of clusters
for (i in 1:3) {
  dend[i] = list(hclust(dist(X, method = "euclidean"), method = meth[i])) #dist.method: euclidean #agglom.method: meth[i]
  plot(dend[[i]],
       main = paste("Dendrogram using", meth[i], sep = " " ), # title
       xlab = "Points",
       ylab = paste("Euclidean", "Distance", sep = " ")
  )
}

# Fitting hierarchical clustering to the dataset with k = 5 (found using dendrogram)
numClus = 5
hc = hclust(dist(X, method = "euclidean"), method = "ward.D") # same function with different var.name
y_hc = cutree(hc, k = numClus) # cut the tree into numClus groups

# Visualizing the clusters
# install.packages("cluster")
library(cluster)
clusplot(X[1:2],
         y_hc,
         lines = 0, # no lines between the cluster centers
         shade = TRUE,
         color = TRUE,
         labels = 1, # 1: pick the points to label interactively, 2: label all points
         plotchar = FALSE,
         span = TRUE, # draw each cluster as the smallest ellipse containing all its points
         main = paste("Clusters of Well Separated Data using ward.D"),
         xlab = "X1",
         ylab = "X2")

clus_size = rep(0, numClus) # one counter per cluster, so that no entry is left as NA
for (i in 1:length(y_hc)) clus_size[y_hc[i]] = clus_size[y_hc[i]]+1 
show(clus_size)
## [1] 210  45  34  NA  NA

Justification for H-clustering Parameters

For the H-clustering parameters, we first plot the dendrogram of the clusters. On this dendrogram, we see the separation distance (length) of the linkages. Then, we find the number of clusters by cutting the tree at the point of maximum linkage length. Hierarchical clustering is then fitted to the dataset with k = 5 (found using the dendrogram).
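One way to make "cutting at the maximum length" reproducible is to look for the largest jump between consecutive merge heights. The sketch below is a minimal illustration under our own assumptions: made-up data with three tight groups (`X_demo` is hypothetical), the "average" linkage (one of the three linkages tried above), and the largest-gap heuristic as our reading of the rule.

```r
# Hypothetical toy data: three tight groups of four points at the corners of a triangle
offsets <- rbind(c(-0.1, -0.1), c(-0.1, 0.1), c(0.1, -0.1), c(0.1, 0.1))
corners <- rbind(c(0, 0), c(10, 0), c(5, 8.66))
X_demo <- do.call(rbind, lapply(1:3, function(j) sweep(offsets, 2, corners[j, ], "+")))

hc_demo <- hclust(dist(X_demo, method = "euclidean"), method = "average")

# hc_demo$height holds one merge height per agglomeration step, in increasing order
gaps <- diff(hc_demo$height)
i_max <- which.max(gaps)        # the largest jump follows merge number i_max
k_demo <- nrow(X_demo) - i_max  # clusters that remain after the first i_max merges

groups_demo <- cutree(hc_demo, k = k_demo)
print(k_demo)
print(table(groups_demo))
```

For this toy data the heuristic recovers the three groups of four points each; on real data the largest gap is often less clear-cut and is best checked against the dendrogram.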

5. Comparison for Classification Models
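As a starting point for the comparison, the test-set confusion matrices printed in Section 3 can be put side by side. The sketch below only re-derives figures from those printed matrices; `summarise_cm` is a hypothetical helper of ours, and class 1 is treated as the positive class, as in the outputs above.

```r
# Test-set confusion matrices from Section 3 (rows: prediction, columns: reference)
cms <- list(
  SVM          = matrix(c(5, 17, 2, 44), nrow = 2),
  KNN          = matrix(c(8, 14, 7, 39), nrow = 2),
  NaiveBayes   = matrix(c(15, 7, 28, 18), nrow = 2),
  RandomForest = matrix(c(4, 18, 2, 44), nrow = 2)
)

# Hypothetical helper: basic metrics from a 2x2 matrix, class 1 being the positive class
summarise_cm <- function(cm) {
  sens <- cm[1, 1] / sum(cm[, 1])
  spec <- cm[2, 2] / sum(cm[, 2])
  c(accuracy = sum(diag(cm)) / sum(cm),
    sensitivity = sens,
    specificity = spec,
    balanced_accuracy = (sens + spec) / 2)
}

comparison <- round(t(sapply(cms, summarise_cm)), 4)
print(comparison)
```

In these figures Naive Bayes has the highest sensitivity (0.6818) but the lowest accuracy (0.4853), while SVM and Random Forest share the highest specificity (0.9565) and detect few of the class 1 cases.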

6. Comparison for Clustering Models

Now, we compare our clustering models using wcss analysis. wcss is the within-cluster sum of squares, which k-means reports as a vector with one component per cluster. To do this, we begin with an empty wcss vector, then run the model with 100 different initial configurations and store the summed within-cluster sum of squares of each run. We can then view the total within-cluster error of each run and look at the indices with minimum error.

wcss = vector() # an empty vector (reset here because the loop and the lines below use wcss)
for (i in 1:100) {
  set.seed(i*20)
  wcss[i] = sum(kmeans(X, 10)$tot.withinss)
} 
plot(20*(1:100), wcss, type = "b", main = paste("Clusters"), xlab = "Initial Seed", ylab = "Within Cluster SS")

which(wcss == min(wcss)) # initial conditions with minimum error
## [1] 81
insens_init = length(which(wcss == min(wcss)))/100
insens_init
## [1] 0.01

In the above analysis, we fitted k-means models with k = 10 and initialized them from 100 different starting points by varying the random seed. We then summed the wcss of each run and compared the runs against each other to measure the insensitivity to the initialization point. Only one of the 100 runs (index 81) reached the minimum error, so insens_init is 0.01.
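Two practical details of such a sensitivity check can be sketched on made-up data (`X_demo` is hypothetical). First, comparing floating-point errors with `==` is fragile, so the sketch counts runs within a small relative tolerance of the best one; the tolerance 1e-6 is an arbitrary choice of ours. Second, the `nstart` argument of `kmeans` keeps the best of several random starts inside a single call, which is one common way to reduce the dependence on the seed.

```r
set.seed(42)
# Hypothetical toy data: 150 points in 2-D without clear cluster structure
X_demo <- matrix(rnorm(300), ncol = 2)

# One k-means run per seed, each from a single random start
single <- sapply(1:20, function(s) {
  set.seed(s)
  kmeans(X_demo, 5, nstart = 1)$tot.withinss
})

# Share of seeds whose error lies within a small relative tolerance of the best run
share_best <- mean(single <= min(single) * (1 + 1e-6))

# nstart = 25 keeps the best of 25 random starts inside one call
set.seed(1)
multi <- kmeans(X_demo, 5, nstart = 25)$tot.withinss

print(c(share_best = share_best, best_single = min(single), multi = multi))
```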

In our analysis, we have observed that increasing k-value significantly

7. Conclusions

8. Self Reflection

9. References

  1. The ECC paper
  2. stackoverflow.com
  3. r-bloggers.com
  4. analyticsvidhya.com